Abstract

Background: The ability to predict protein-protein binding sites has a wide range of applications, including signal 
transduction studies, de novo drug design, structure identification and comparison of functional sites. The interface in 
a complex involves two structurally matched protein subunits, and the binding sites can be predicted by identifying 
structural matches at protein surfaces. 

Results: We propose a method which enumerates "all" the configurations (or poses) between two proteins (3D 
coordinates of the two subunits in a complex) and evaluates each configuration by the interaction between its 
components using the Atomic Contact Energy function. The enumeration is achieved efficiently by exploring a set of 
rigid transformations. Our approach incorporates a surface identification technique and a method for avoiding clashes 
of two subunits when computing rigid transformations. When the optimal transformations according to the Atomic 
Contact Energy function are identified, the corresponding binding sites are given as predictions. Our results show that 
this approach consistently performs better than other methods in binding site identification. 

Conclusions: Our method achieved a success rate higher than other methods, with the prediction quality improved 
in terms of both accuracy and coverage. Moreover, our method is able to predict the configurations of two 
binding proteins, whereas most other methods predict only the binding sites. The software package is available at 
http://sites.google.com/site/guofeics/dobi for non-commercial use. 



Background 

Most of the existing efforts to identify the binding sites 
in protein-protein interaction are based on analyzing the 
differences between interface residues and non-interface 
residues, often through the use of machine learning or 
statistical methods. These methods differ in the features 
analyzed, that is, the sequence and structural or physical 
attributes. Chung et al. [1] used multiple structure align- 
ments of the individual components in known complexes 
to derive structurally conserved residues. Sequence profile 
and accessible surface area information are combined with 
the conservation score to predict protein-protein bind- 
ing sites by using a Support Vector Machine. Ofran et al. 
[2] employed neural networks to predict binding sites, 
using the sequence environment, the profile and the structural 
features as input. The random forest algorithm has also been 
used to exploit these features from sequences or 3D structures 
for binding site prediction [3,4]. PSIVER [5] uses 
sequence features to train a Naïve Bayes classifier to 
predict binding sites. In PSIVER, the conditional probabilities 
of each sequence feature are estimated using a kernel 
density estimation method. 

Besides the machine learning and statistical approaches, 
3D structural algorithms and other methods have also 
been used to identify binding sites through investigat- 
ing protein surface structures. ProBiS [6] predicts binding 
sites by local surface structure alignment. It compares 
the query protein to 3D protein structures in a database 
to detect proteins with structurally similar sites on the 
surfaces. Burgoyne et al. [7] analyzed clefts in protein 
surfaces that are likely to correspond to the binding 
sites. They ranked them according to sequence conservation 
and simple measures of physical properties, including 
hydrophobicity, desolvation, electrostatic and van der 
Waals potentials. Ortuso et al. [8] defined the most relevant 
interaction areas in complexes by deriving pharmacophore 
models from 3D structure information. Their approach is based on 3D 
maps computed by the GRID program on structurally 
known molecular complexes. 

ProMate [9] is based on the idea of interface and non- 
interface circles. A circle is first created around each 
residue. Then, features are extracted from these circles. 
Statistics are performed and histograms are created for 
each feature. Thereafter, the probability for each circle of 
a test protein to be an interface is estimated. The interface 
circles are clustered for each test protein to identify the 
binding patch. 

Bradford et al. [10] proposed an approach (PPI-Pred) 
which uses SVM (Support Vector Machine) on surface 
patch features to predict binding sites. PPI-Pred gener- 
ates an interacting patch and a non-interacting patch for 
each protein. Seven features are extracted for each patch 
to build an SVM model, which is then used to predict if a 
given test patch is an interacting patch. 

In PINUP [11], an empirical scoring function is pre- 
sented to predict binding sites. The function is a lin- 
ear combination of energy score, interface propensity 
and residue conservation score. A patch is formed by a 
residue and its spatial neighbors within the protein sub- 
unit. PINUP takes the top 5% scoring patches and ranks 
residues based on their occurrences in these patches. 
The top 15 ranked residues are predicted as the interface 
residues. 

Li et al. [12] proposed another SVM approach (core- 
SVM). The residues of the proteins are divided into four 
classes: the interior residues, the core interface residues, 
the rim interface residues, and the non-interface residues. 
The core interface and rim interface residues are distin- 
guished by the percentage of their neighboring residues 
which are interface residues. An SVM is built over eight 
features extracted from the interface residues, and used 
to compute the probability of whether a residue is a core 
interface residue. 

Meta-servers have also been constructed to com- 
bine the strengths of existing approaches. The pro- 
gram called meta-PPISP [13] combines three individual 
servers, namely cons-PPISP, ProMate and PINUP; another 
program called metaPPI [14] combines five prediction 
methods, namely PPI-Pred, PINUP, PPISP, ProMate, and 
SPPIDER [15]. 

Another approach in binding site prediction is to examine 
the possible structural configurations, also referred 
to as poses, of protein subunits, that is, how the subunits 
may dock. Docking methods based on fast Fourier 
transformation (FFT) [16,17], geometric surface matching 
[18], as well as intermolecular energy [19-21] have 
been proposed. Fernández-Recio et al. [22] simulated 
protein docking and analyzed the interaction energy landscapes. 
Their method uses a global docking method 
based on multi-start global energy optimization of the 
ligand. It explores the conformational space around the 
whole receptor, and uses the rigid-body docking configurations 
to project the docking energy landscapes onto 
the surfaces. The low-energy regions are predicted as the 
binding sites. 

In this paper, we propose a method which enumer- 
ates the configurations of two binding proteins (that 
is, the possible positions of the two subunits in a 
complex), and identifies binding sites by evaluating the 
interaction between the components using the Atomic 
Contact Energy (ACE) function [23]. We perform rigid 
transformation to enumerate the configurations of two 
binding proteins. The enumeration is performed in con- 
junction with a surface identification technique for avoid- 
ing clashes between protein subunits when computing 
rigid transformations. The transformations which result 
in the minimum score according to the Atomic Contact 
Energy function are found; the corresponding interact- 
ing residues are reported as binding sites. Our method is 
implemented in a program called DoBi. 

We perform experiments to compare DoBi with the existing 
methods, using commonly used assessment measures. 
The program outperforms the other methods on 
these measures. DoBi achieved a success rate higher 
than all the other methods, improving prediction qual- 
ity in terms of both accuracy and coverage. In addition, 
it predicts the configurations of two binding proteins, as 
opposed to giving only the binding sites. 

Methods 

The main idea of our method is to enumerate "all" configu- 
rations between two proteins, where a configuration refers 
to the 3D coordinates representing the relative position 
and orientation of two protein subunits in a complex. We 
use the Atomic Contact Energy (ACE) function to com- 
pute the score for a configuration. The configurations with 
the lowest score are chosen, and the corresponding inter- 
acting residues are predicted as binding sites. We use rigid 
transformation to enumerate the configurations. The two key 
components of the method are (1) an efficient algorithm 
to enumerate "all" configurations (rigid transformations) 
and (2) a good energy score. 

Atomic contact energy 

Atomic Contact Energy (ACE) is an atomic desolvation 
energy measure developed in [24]. It is defined as the 
energy of replacing a protein-atom/water contact with a 
protein-atom/protein-atom contact. The ACE score takes 
into account 18 atom types, resulting in 18 × 18 possible 
atom pairs. The score for each atom pair has been 
determined, based on a statistical analysis of atom-pairing 
frequencies in known proteins. These pre-determined 
scores are given as log-likelihood values in [24], which allows 
these values to be summed. The pre-determined 
score of effective contact energy between atom type i and 
type j is defined as 

$$e_{i,j} = -\ln \frac{(N_{i,j}/C_{i,j}) \times (N_{0,0}/C_{0,0})}{(N_{i,0}/C_{i,0}) \times (N_{j,0}/C_{j,0})}$$

where type 0 corresponds to the solvent. The number of 
i-j contacts ($N_{i,j}$) and the number of i-0 contacts ($N_{i,0}$) are 
estimates of the actual contact numbers in known complexes. 
In addition, $C_{i,j}$ and $C_{i,0}$ are defined as the expected 
numbers of i-j contacts and i-0 contacts. 

For a given configuration, the ACE score is the sum over 
all atom pairs (one atom from each subunit) within a 
threshold distance d; d = 6 Å is used in this paper. 
Denote the sets of atoms from the two subunits as $S_1$ and 
$S_2$, respectively; then the ACE score is computed as 

$$E_{ACE} = \sum_{s \in S_1,\ t \in S_2,\ \|s - t\| < d} T[s,t]$$

where $\|s - t\|$ is the Euclidean distance between s and t, 
and $T[s,t]$ is the pre-determined score of the atom pair s 
and t. 

The ACE score can be considered an estimate of the 
change in desolvation energy of the two proteins in going 
from the unbound state to the complex. A lower ACE 
value implies a lower (and hence more favorable) desolva- 
tion free energy. 
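
The summation above can be illustrated with a minimal sketch. It assumes each atom is given as a (position, type index) pair and that the pairwise score table is supplied by the caller; the function name `ace_score` and the 2 × 2 table below are hypothetical stand-ins for illustration (the real ACE table is 18 × 18 and its values are not reproduced here).

```python
import math

def ace_score(atoms_1, atoms_2, pair_score, d=6.0):
    """Sum pair_score[type_s][type_t] over all atom pairs (s, t),
    one atom from each subunit, whose Euclidean distance is below d (in Å)."""
    total = 0.0
    for pos_s, type_s in atoms_1:
        for pos_t, type_t in atoms_2:
            if math.dist(pos_s, pos_t) < d:
                total += pair_score[type_s][type_t]
    return total

# Hypothetical 2-type score table, used only to exercise the function.
toy_table = [[-0.5, 0.2],
             [0.2, -0.1]]

# Each atom is ((x, y, z), type_index); coordinates are in Å.
subunit_1 = [((0.0, 0.0, 0.0), 0), ((3.0, 0.0, 0.0), 1)]
subunit_2 = [((0.0, 4.0, 0.0), 0), ((20.0, 0.0, 0.0), 1)]

score = ace_score(subunit_1, subunit_2, toy_table)
```

The double loop is quadratic in the number of atoms; a spatial grid or k-d tree would be the usual way to restrict the sum to nearby pairs, but that is a design choice outside what the text specifies.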

Enumeration of the configurations 

In this paper, we assume that subunits are rigid. A protein 
structure consists of a sequence of residues. Each 
residue consists of a set of atoms. We assume that the 
atoms in a residue are ordered as a sequence. Hence, the 
whole protein structure can be represented by a sequence 
of atoms. In the rest of this subsection, we let A and B 
denote two protein structures (subunits), and write 
$A = (a_1, a_2, \ldots, a_m)$ and $B = (b_1, b_2, \ldots, b_n)$, where $a_i$ and $b_j$ 
are atoms of structures A and B. Without loss of generality, 
we assume that $n \geq m$. We also assume that we know 
the 3D coordinates of each atom in both input proteins. 
We use $A[i : j]$ to denote the subsequence $(a_i, \ldots, a_j)$, and 
refer to a subsequence of atoms as a structural fragment. 

To enumerate all the configurations, we assume B is 
fixed, and we perform rotations and translations (referred 
to as rigid transformations, and simply, transforma- 
tions, in the rest of the paper) on A. The method pro- 
posed here is modified from the algorithms for structure 
comparison [25]. 

Assume that two points $a_i$ and $a_j$ of A interact with two 
points $b_{i'}$ and $b_{j'}$ of B; then we know that $\|a_i - b_{i'}\| < d$ 
and $\|a_j - b_{j'}\| < d$. To enumerate the configurations, we 
enumerate the positions for atoms $a_i$ and $a_j$ first, and for 
each fixed pair of positions of $a_i$ and $a_j$, we rotate A about the 
line formed by $a_i$ and $a_j$. Let the d-ball of an atom a be 
the ball with radius d centered at a. We discretize the 
d-ball of $b_{i'}$ with step size $\epsilon d$, where $\epsilon$ is a small constant 
(we choose $\epsilon = 0.1$ in this paper). Each grid point in 
the d-ball of $b_{i'}$ is used as a candidate position for atom $a_i$ 
for the binding. When $a_i$ is fixed at one of the grid points, 
the possible positions for $a_j$ form a sphere cap, where the 
sphere is centered at $a_i$ with radius $\|a_i - a_j\|$, and the cap 
is the portion of the sphere enclosed in the d-ball of $b_{j'}$. 
Again, we discretize the sphere cap with step size $\epsilon d$. Each 
grid point on the sphere cap is a candidate position for $a_j$. 
This gives us a total of $O((1/\epsilon)^5)$ possible positions for the 
pair $a_i$ and $a_j$. After $a_i$ and $a_j$ are fixed on their respective 
grid points, the only degree of freedom to move $A[i : j]$ 
is to rotate it around the axis through $a_i$ and $a_j$. We use 
a 1° step size; that is, we explore 360 different positions 
for the remaining atoms through 360 rotations. Figure 1 
illustrates the steps to compute a transformation. 
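
The final rotation step can be sketched as follows. This is only an illustration under stated assumptions: atoms are plain (x, y, z) tuples, the two anchor atoms have already been placed, and Rodrigues' rotation formula is used to rotate about the axis through them. The function names are hypothetical and are not taken from the DoBi package.

```python
import math

def rotate_about_axis(point, a_i, a_j, angle_deg):
    """Rotate `point` by angle_deg about the line through a_i and a_j
    using Rodrigues' rotation formula."""
    axis = [a_j[k] - a_i[k] for k in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    u = [c / norm for c in axis]                    # unit vector along the axis
    v = [point[k] - a_i[k] for k in range(3)]       # point relative to a_i
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cross = [u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0]]             # u x v
    dot = sum(u[k] * v[k] for k in range(3))        # u . v
    return tuple(a_i[k] + v[k] * cos_t + cross[k] * sin_t + u[k] * dot * (1 - cos_t)
                 for k in range(3))

def enumerate_rotations(atoms, a_i, a_j, step_deg=1):
    """Yield (angle, rotated atoms) for every rotation step around the a_i-a_j axis."""
    for angle in range(0, 360, step_deg):
        yield angle, [rotate_about_axis(p, a_i, a_j, angle) for p in atoms]

# Toy example: one atom rotated about the z-axis in 1-degree steps.
poses = list(enumerate_rotations([(1.0, 0.0, 0.0)], (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```

Atoms lying on the axis, including the two anchors themselves, are left in place by the rotation, which is why fixing $a_i$ and $a_j$ leaves exactly one rotational degree of freedom.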

The method will work well if we know two interaction 
pairs $(a_i, b_{i'})$ and $(a_j, b_{j'})$. We could simply enumerate all 
atom pairs as interaction pair candidates. However, 
there would be $O(n^4)$ such cases, which makes the computer 
program too slow in practice. This is perhaps one of the 
reasons that such a method has not been tried. The focus 
of the following subsection is to identify two pairs $(a_i, b_{i'})$ 
and $(a_j, b_{j'})$ which are more likely to be interaction pairs. 

When enumerating "all" configurations, we also want to 
make sure that (1) only surface fragments can be candidate 
binding sites for a configuration and (2) there is no clash 
between the two proteins in such a configuration. Before 
presenting the details of the method, we define the surface 
atoms and clashes of two subunits first. 

Surface atoms 

The interface residues of two proteins are necessarily surface 
residues. Inspired by the work on LIGSITEcsc [26,27], 
we propose a method to identify the surface atoms of a 
protein. 

First, we build a 3D grid with step size 1 Å around the 
protein. Then, each grid point is labeled as a protein point 
if it is within distance 2 Å of any atom, and labeled as empty 
otherwise. We further subdivide the protein grid points 
into two types: interior or surface. A protein grid point is 
labeled as surface if at least one of its six neighboring grid 
points is empty; otherwise it is labeled as interior. With 
the grid points labeled, we can label the atoms: an atom 
is labeled as a surface atom if it is within distance 1.5 Å of 
a surface grid point; otherwise it is labeled as an interior 
atom. 

Figure 2 gives an example in 2D, where a protein grid 
point is labeled as interior if it has all four neighbors 
as protein points. In 3D, a protein grid point should be 
labeled as interior if all of its six neighbors are labeled 
as protein. 
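
The grid labeling described above can be sketched in a few lines. This is a simplified illustration, not the DoBi implementation: atoms are (x, y, z) tuples in Å, "within distance" is taken to mean less than or equal to the cutoff, and grid points are stored as integer index triples. The function names are hypothetical.

```python
import math

# The six axis-aligned neighbours of a 3D grid point.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_grid(atoms, step=1.0, protein_dist=2.0):
    """Return (protein, surface): sets of integer grid indices.
    A grid point is a protein point if it lies within protein_dist of an atom;
    a protein point is a surface point if any of its six neighbours is empty."""
    reach = math.ceil(protein_dist / step) + 1
    protein = set()
    for x, y, z in atoms:
        ci, cj, ck = round(x / step), round(y / step), round(z / step)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    if math.dist((i * step, j * step, k * step), (x, y, z)) <= protein_dist:
                        protein.add((i, j, k))
    surface = {
        (i, j, k) for (i, j, k) in protein
        if any((i + di, j + dj, k + dk) not in protein for di, dj, dk in NEIGHBORS)
    }
    return protein, surface

def surface_atom_indices(atoms, step=1.0, protein_dist=2.0, atom_dist=1.5):
    """Indices of atoms lying within atom_dist of at least one surface grid point."""
    _, surface = label_grid(atoms, step, protein_dist)
    return [
        idx for idx, atom in enumerate(atoms)
        if any(math.dist(atom, (i * step, j * step, k * step)) <= atom_dist
               for (i, j, k) in surface)
    ]

# Toy input: a 5 x 5 x 5 block of pseudo-atoms spaced 1 Å apart.
block = [(float(x), float(y), float(z)) for x in range(5) for y in range(5) for z in range(5)]
surface_idx = surface_atom_indices(block)
```

In this toy block a corner atom is reported as a surface atom while the central atom is not. The brute-force search over all surface points is kept for readability; a real implementation would look only at grid points near each atom.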



Guo et al. BMC Bioinformatics 2012, 13:158 
http://www.biomedcentral.com/1471-2105/13/158 




Clashes of two subunits 

A valid configuration must not cause the two subunits to 
clash. The following method is used to detect whether a 
configuration results in clashes. Given a configuration, we 
build a 3D grid as in the previous subsection. For each of 
the structures A and B, we mark the grid points as interior, 
surface, or empty. We use a threshold $\theta$ to identify 
whether two subunits clash, by calculating the proportion 
of interior points that they share. We say that the two 
subunits clash if they share more than $\theta \times 100\%$ of their 
interior points; that is, if X is the number of interior grid 
points which are shared by both proteins, and $X_A$ and $X_B$ 
are the numbers of interior grid points of each subunit, 
respectively, then we require that $X \leq \theta \times \min\{X_A, X_B\}$ if 
the subunits do not clash. 

Figure 2 The surface atoms are indicated in 2D. (A) the grid is created, and grid points are labeled as either empty or protein; (B) the grid points 
labeled as protein are relabeled as surface or interior; (C) an atom is labeled either as surface or as interior. We use 2D as an illustration. 
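
The clash criterion reduces to a set intersection once the interior grid points of each subunit are known. The sketch below assumes the interior points are available as sets of integer grid indices on a common grid; the function name `subunits_clash` is hypothetical, and the default threshold of 0.17 is the value the paper reports choosing on its training set.

```python
def subunits_clash(interior_a, interior_b, theta=0.17):
    """Return True if the two subunits share more than
    theta * min(|interior_a|, |interior_b|) interior grid points."""
    shared = len(interior_a & interior_b)
    return shared > theta * min(len(interior_a), len(interior_b))

# Toy example on a one-dimensional strip of grid points.
interior_a = {(i, 0, 0) for i in range(10)}          # 10 interior points
overlapping = {(i, 0, 0) for i in range(8, 28)}      # shares 2 points with interior_a
touching = {(i, 0, 0) for i in range(9, 29)}         # shares 1 point with interior_a

clash_overlapping = subunits_clash(interior_a, overlapping)
clash_touching = subunits_clash(interior_a, touching)
```

With 10 interior points in the smaller subunit, the limit is 1.7 shared points, so two shared points count as a clash and one does not.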

Finding the two interaction pairs 

In the following subsections, we present the details to 
explore the potential interaction pairs. 

Identify candidate fragment pairs 

We first select fragment pairs that are potential binding 
sites. As discussed in Section "Enumeration of the configurations", 
there are $O(n^4)$ possible fragment pairs $(a_i, a_{i'})$ 
and $(b_j, b_{j'})$ for each binding site. To reduce the computational 
complexity, we adopt a local alignment algorithm to 
accelerate this selection. This is a rough estimate, and we 
hope that the actual binding sites are not discarded by this 
process. 

We first use a heuristic to quickly discard fragment 
pairs that are unlikely to bind. The heuristic simplifies the 
problem as follows: (1) every atom is within the threshold 
distance required in the ACE computation (that is, we 
ignore the geometry of the structure); (2) each atom interacts 
with at most one atom; (3) interacting pairs follow 
a sequential order. That is, for any two pairs of interacting 
atoms $(a_i, b_{i'})$ and $(a_j, b_{j'})$, we have either $i < j$ and $i' < j'$, 
or $j < i$ and $j' < i'$. With these three simplifications, 
the standard Smith-Waterman local alignment algorithm 
[28] can be employed, with the ACE scores used as the 
penalty (negation of the score) for alignment. We use a 
penalty of 1 for aligning an atom to a space. Each local 
aligned segment gives us two fragments, where each atom 
in a fragment is either aligned to an atom from the 
partner or aligned to nothing (i.e., aligned to a space). 

We present the details here. For two sequences $P_1$ and $P_2$, 
an alignment of $P_1$ and $P_2$ can be obtained by (1) inserting 
spaces into the two sequences $P_1$ and $P_2$ such that 
the two resulting sequences with inserted spaces, $P_1'$ and 
$P_2'$, have the same length and (2) overlapping the two resulting 
sequences $P_1'$ and $P_2'$. The score of the alignment is the 
sum of the scores for all the columns, where each column 
has a pair of letters (including spaces) and for each pair of 
letters there is a pre-defined score. A subsequence $\alpha$ of $P_1$ 
and a subsequence $\beta$ of $P_2$ form a local aligned 
segment if the score between $\alpha$ and $\beta$ is minimum. 
Here we want to find all (non-overlapping) pairs of subsequences 
with a score of at most x. For our purpose, we set 
x = 0 throughout the paper. 
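
A minimal sketch of this alignment step is given below. It treats each protein as a sequence of atom-type indices and runs the textbook Smith-Waterman recurrence in its usual maximizing form, so the column score is the negated ACE value and aligning an atom to a space costs 1. Two simplifications are assumptions of this sketch: the score table is a hypothetical 2 × 2 stand-in, and only the single best segment is returned, whereas the text asks for all non-overlapping segments with a score of at most x.

```python
def local_align(types_a, types_b, pair_score, gap_penalty=1.0):
    """Smith-Waterman local alignment over atom-type sequences.
    Returns (best, (a_start, a_end), (b_start, b_end)) with half-open index
    ranges; `best` is the maximized score, i.e. the negated summed penalty."""
    m, n = len(types_a), len(types_b)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] - pair_score[types_a[i - 1]][types_b[j - 1]]
            H[i][j] = max(0.0, diag, H[i - 1][j] - gap_penalty, H[i][j - 1] - gap_penalty)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] - pair_score[types_a[i - 1]][types_b[j - 1]]
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])

# Hypothetical table: contacts between equal types are favourable (negative).
toy_table = [[-1.0, 1.0],
             [1.0, -1.0]]
result = local_align([0, 1, 0], [1, 1, 0, 1, 0, 1], toy_table)
```

Here the whole of the first sequence aligns to positions 2 to 4 of the second with a maximized score of 3, which corresponds to a summed penalty of -3 in the minimizing convention used in the text.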

Due to the simplifications, there are many false positive 
results, and some of the true interaction pairs may be 
filtered out. The latter issue can be handled to some extent 
by raising the threshold. The former issue is tackled by 
further refinement in the next subsection. In practice, our 
program outputs 70 to 120 fragment pairs as potential 
binding sites, which is much smaller than $O(n^4)$, where 
the number of atoms n in a protein ranges from 500 to a few 
thousand. 

Since a binding site is necessarily on the surface of a 
subunit, we filter out fragments with only very few atoms 
on the surface. To achieve this, we use a sliding window 
of length 15 to parse the aligned fragment pair. For each 
window, if the surface atoms are at least 2/3 (that is, ten 
atoms) for both fragments, the fragment pair of this win- 
dow is kept for further processing and this fragment pair 
is extracted from the alignment. We continue this pro- 
cess on the un-extracted portion of the alignment. If the 
window does not contain sufficient surface atoms, we con- 
tinue at the next window. Our choice of 2/3 comes from 
observations with a docking decoy set from the Dock- 
ground [29], where 94% of the binding sites have more 
than 2/3 of surface atoms. 
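
The window filter can be sketched as below. The text leaves some details open, so this sketch makes explicit assumptions: each fragment is given as a list of booleans (one per alignment column, true for a surface atom, with gap columns counted as non-surface), an accepted window is extracted whole and the scan resumes right after it, and a rejected window advances the scan by one position. The function name is hypothetical.

```python
def surface_windows(surf_a, surf_b, window=15, min_surface=10):
    """Return half-open (start, end) ranges of windows in which both fragments
    contain at least min_surface surface atoms."""
    kept = []
    n = min(len(surf_a), len(surf_b))
    start = 0
    while start + window <= n:
        enough_a = sum(surf_a[start:start + window]) >= min_surface
        enough_b = sum(surf_b[start:start + window]) >= min_surface
        if enough_a and enough_b:
            kept.append((start, start + window))
            start += window      # continue on the un-extracted portion
        else:
            start += 1           # assumed: slide the window by one position
    return kept

# Toy aligned pair of length 40; fragment B starts with five buried atoms.
frag_a = [True] * 40
frag_b = [False] * 5 + [True] * 35
windows = surface_windows(frag_a, frag_b)
```

With five buried atoms at the start of fragment B, the first window still holds ten surface atoms and is kept; one more buried atom would push the first accepted window one position to the right.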

Identify configurations of fragment pairs 

From the fragment pairs obtained in the previous step, 
a second step is used to further filter out fragment pairs 
that do not reach an ACE score below a threshold. Given two structural 
fragments $A[i : j] = (a_i, \ldots, a_j)$ and $B[i' : j'] = (b_{i'}, \ldots, b_{j'})$, 
we assume that $a_i$ interacts with $b_{i'}$ and $a_j$ interacts with 
$b_{j'}$. Using the enumeration method described earlier, we 
enumerate different configurations for A and B and compute 
the corresponding ACE score for the atom sets $A[i : j]$ 
and $B[i' : j']$. We do not consider any configuration which 
causes A and B to clash. In this step, a pair of structural 
fragments which does not give any configuration with an 
ACE score below a specified threshold is discarded. In 
this paper, we define the threshold value as 400, since the 
ACE scores of the actual interfaces in the docking decoy set 
from Dockground are all less than 400. After this step, it is 
unlikely for two protein structures which cannot bind 
to have an unfiltered fragment pair. 

Identify the configuration for the two subunits 

In the third step, for each pair of protein structures with 
at least one remaining fragment pair, we enumerate all 
the potential configurations for the structures. We 
use the begin and end atoms of the identified fragments 
as our choice of $(a_i, b_{i'})$ and $(a_j, b_{j'})$ in the enumeration, 
since these are the atoms that are likely to be interacting. 
Assuming that there are k fragment pairs from the 
same two proteins left after the filtration of the second 
step, we will have a maximum of 2k distinct atom pairs to 
choose from. Thus, there is a total of at most $\binom{2k}{2}$ combinations 
to consider for the choice of $(a_i, b_{i'})$ and $(a_j, b_{j'})$. 

When the best configuration is obtained, two residues, 
one from each subunit, are reported as the inter- 
face residues if they can be connected with a pair of 
atoms within distance 4.5 Å. In our search for the best 
configuration, we also require the configurations to be free 
from clashes. 

Table 1 Details of DoBi on the training set 
The table reports the F-score (%) of our method on the receptor proteins and the F-score (%) of our method on the ligand proteins. 

Table 2 Comparison of DoBi and Fernández-Recio et al.'s method 

| Method | Acc (%) | Cov (%) | F | Standard deviation of predicted sizes |
|---|---|---|---|---|
| Fernández-Recio et al.'s | 39.3 | 72.7 | 0.51 | 40.0 |

Suc (%) is the success rate of the corresponding method on the data set. 
Acc (%) is the average accuracy of the corresponding method on the data set. 
Cov (%) is the average coverage of the corresponding method on the data set. 
The table also gives the average and the standard deviation of the sizes predicted by the corresponding method on the data set. 
F is the F-score of the corresponding method on the data set. 

Table 3 Detailed results of DoBi and Fernández-Recio et al.'s method 

| Complex | PDB | Intn | DoBi Acc (%) | DoBi Cov (%) | Fernández-Recio et al. Acc (%) | Fernández-Recio et al. Cov (%) |
|---|---|---|---|---|---|---|
| 1wq1(G:R) | 1wer | | | | | 33.0 |
| | 5p21 | 26 | 77.8 | 80.8 | 40.8 | 53.0 |
| 1bth(H:P) | 2hnt | 30 | 15.2 | 16.7 | 27.7 | 61.0 |
| | 6pti | 17 | 94.1 | 94.1 | 32.5 | 39.0 |
| 1fin(A:B) | 1hcl | 46 | 35.5 | 47.8 | 28.3 | 68.0 |
| | 1vin | 35 | 32.8 | 60.0 | 66.7 | 100 |
| 1fq1(B:A) | 1b39 | 16 | 63.2 | 75.0 | 8.2 | 32.0 |
| | 1fpz | 16 | 63.2 | 75.0 | 0 | 0 |

PDB is the unbound structure of the receptor or ligand in the complex. 
Intn is the number of residues on the actual interface in the complex. 
Acc (%) is the accuracy of the corresponding method on the data set. 
Cov (%) is the coverage of the corresponding method on the data set. 
The values for Fernández-Recio et al.'s method are from literature [22]. 



Results and discussion 

Three commonly used measures are utilized to assess 
the performance of DoBi. Accuracy and coverage are two 
common measures to assess the quality of the binding 
sites predicted by a method [11]. The accuracy of the 
predicted interface is the fraction of correctly predicted 
interface residues over the total number of predicted interface 
residues; the coverage of the predicted interface is the 
fraction of correctly predicted interface residues over the 
total number of actual interface residues. The F-score, 
$F = 2 \times \frac{accuracy \times coverage}{accuracy + coverage}$, is the harmonic mean of the accuracy 
and coverage; it reaches its best 
value at 1 and its worst at 0. Another common measure 
is the success rate, which is defined in [9]. A reported 
result is counted as a success if at least half of the predicted 
residues are actual interface residues; that is, the 
accuracy is no less than 50%. The success rate is the fraction 
of successfully predicted cases over the total number of 
predicted proteins. 

A protein complex may contain several subunits and 
multiple binding sites. Each binding site in a protein complex 
consists of a pair of subunits. Two residues in a pair 
of subunits are called interface residues if any two atoms, 
one from each residue, interact. By interact, we mean that the 
distance between the two atoms is less than the sum of the 
van der Waals radii of the two atoms plus 1 Å. The number 
of residues on the interface is referred to as the interface 
size. 

Training set 

We use the unbound protein structures from Dockground 
[29] as the training set to calculate the parameters of DoBi. 
The docking decoys from Dockground were generated 
by a GRAMM-X scan. The GRAMM-X docking scan was 
used to generate 102 unbound-unbound complexes and 
131 unbound-bound complexes. By excluding the proteins 
used in the comparison, 36 unbound-unbound complexes 
and 80 unbound-bound complexes can be used to calculate 
the value of the threshold $\theta$. When we set $\theta = 0.17$, 
the overall F-score of DoBi on the training set is 60.5%, 
which is the best score that DoBi achieves under different 
threshold values. The details on the training set are shown 
in Table 1. 


Comparison to the existing methods 

We divide our comparisons into four separate groups, 
where in each group we compare a different set of methods. 
We cannot compare all the methods 
on the same data set because some 
methods are unavailable, in which case the only comparison possible is 
with the results in the respective publications. 

Comparison to Fernández-Recio et al.'s method 

DoBi is compared to the method introduced by 
Fernández-Recio et al. in [22], using the test data therein, 
which consists of 43 complexes. The results are reported 
in Table 2. The overall accuracy and coverage for DoBi 
are 44.3% and 70.5%, respectively. Fernández-Recio et al.'s method 
achieved an overall accuracy and coverage of 39.3% and 
72.7%, respectively. The success rate for DoBi is 39.6%, 
improving over the success rate of 37.2% reported by 
Fernández-Recio et al. The F-score is 0.54 for DoBi, and 
0.51 for Fernández-Recio et al.'s method. 
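
As a quick check on these figures, the evaluation measures can be written out in a short sketch. The function names are hypothetical, and the sketch assumes that the reported F-scores follow from the overall accuracy and coverage; the paper may instead average per-protein values, so the agreement below is only a consistency check.

```python
def f_score(accuracy, coverage):
    """Harmonic mean of accuracy and coverage (both given as fractions)."""
    if accuracy + coverage == 0:
        return 0.0
    return 2 * accuracy * coverage / (accuracy + coverage)

def evaluate(predicted, actual):
    """Return (accuracy, coverage, F-score, success) for one protein,
    given the predicted and the actual interface residues."""
    predicted, actual = set(predicted), set(actual)
    correct = len(predicted & actual)
    accuracy = correct / len(predicted) if predicted else 0.0
    coverage = correct / len(actual) if actual else 0.0
    # A prediction counts as a success when accuracy is at least 50%.
    return accuracy, coverage, f_score(accuracy, coverage), accuracy >= 0.5

# Overall accuracy and coverage quoted in the text, as fractions.
dobi_f = f_score(0.443, 0.705)
fernandez_recio_f = f_score(0.393, 0.727)

# Toy prediction: 4 predicted residues, 3 of them among 6 actual interface residues.
toy = evaluate({"A10", "A11", "A12", "A40"}, {"A10", "A11", "A12", "A13", "A14", "A15"})
```

Rounded to two decimals, the two F-scores come out at 0.54 and 0.51, matching the values quoted above.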

The average predicted sizes for DoBi and Fernández-Recio 
et al.'s method are 37.5 residues and 46.3 residues, 
respectively, while the average actual size is 21.1 residues. 
The standard deviation of the sizes predicted by DoBi is 
29.0, while that of the sizes predicted by Fernández-Recio 
et al.'s method is 40.0. 

Table 3 displays the detailed results for all unbound 
structures of the 43 complexes. Each row corresponds to a 
pair of proteins. We can observe from the table that the 
binding sites are identified accurately for the complexes 
2sni(E:I), 2sic(E:I), 1ay7(A:B) and 1wq1(G:R). 

Comparison to metaPPI, meta-PPISP and PPI-Pred 

In this group of our comparisons, the test set in [14] is 
used. It consists of 41 complexes from the benchmark v2.0 
[30] and 27 targets from the CAPRI experiment [31]. The 
41 complexes are divided into two categories, enzyme- 
inhibitor (EI) and others. We compare our method to 
metaPPI, meta-PPISP and PPI-Pred with this group of 
data. The overall accuracy and coverage of each prediction 
method are shown in Table 4. DoBi has an F-score of 0.55, 
where in contrast, metaPPI, meta-PPISP and PPI-Pred 



Table 4 Comparisons of DoBi, metaPPI, meta-PPISP and PPI-Pred 

Suc (%) is the success rate of the corresponding method on the data set. 
Acc (%) is the average accuracy of the corresponding method on the data set. 
Cov (%) is the average coverage of the corresponding method on the data set. 
E-I is the type of enzyme-inhibitor. 
The table also gives the average and the standard deviation of the sizes predicted by the corresponding method on the data set, and the F-score of the corresponding method on the data set. 

Table 5 Detailed results of DoBi, metaPPI, meta-PPISP and PPI-Pred on 41 complexes 

| Complex | PDB | Intn | DoBi Acc | DoBi Cov | metaPPI Acc | metaPPI Cov | meta-PPISP Acc | meta-PPISP Cov | PPI-Pred Acc | PPI-Pred Cov |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | 9.1 | 5.3 | 5.6 | 8.3 | 16.7 | 52.6 |
| | 1czpA | 17 | 51.6 | 94.1 | 57.1 | 42.1 | 63.2 | 63.2 | 50.0 | 41.2 |
| 1f34(A:B) | 4pep_ | 25 | 44.8 | 52.0 | 30.8 | 12.5 | 30.3 | 52.6 | 47.5 | 76.0 |
| | 1f32A | 24 | 57.9 | 45.8 | 72.7 | 24.2 | 55.2 | 69.6 | 70.4 | 79.2 |
| 1mah(A:F) | 1j06B | 27 | 35.9 | 51.9 | 16.7 | 3.4 | 28.0 | 63.6 | 36.6 | 96.3 |
| | 1fsc_ | 21 | 86.4 | 90.5 | 15.8 | 15.0 | 33.3 | 21.9 | 33.3 | 28.6 |
| 1ppe(E:I) | 1btp_ | 27 | 64.9 | 88.9 | 64.3 | 42.9 | 40.9 | 42.8 | 0 | 0 |
| | 1lu0A | 14 | 63.2 | 85.7 | 92.3 | 75.0 | 100 | 56.0 | 90.0 | 64.3 |
| 1tmq(A:B) | 1jae_ | 28 | 62.2 | 82.1 | 75.0 | 40.0 | 36.0 | 30.0 | 63.4 | 92.9 |
| | 1b1uA | 26 | 57.1 | 76.9 | 93.3 | 56.0 | 70.4 | 76.0 | 0 | 0 |
| 1udi(E:I) | 1udh_ | 26 | 52.2 | 46.2 | 63.6 | 25.9 | 48.0 | 66.7 | 72.0 | 69.2 |
| | 2ugiB | 26 | 94.4 | 65.4 | 92.9 | 56.5 | 72.7 | 80.0 | 85.7 | 46.2 |
| 2pcc(A:B) | 1ccp_ | 13 | 20.0 | 23.1 | 53.8 | 50.0 | 26.7 | 33.3 | 0 | 0 |
| | 1ycc_ | 14 | 26.3 | 35.7 | 42.9 | 35.3 | 37.5 | 33.3 | 13.3 | 14.3 |
| 2sic(E:I) | 1sup_ | 26 | 50.0 | 46.2 | 72.7 | 38.1 | 81.8 | 60.0 | 62.5 | 76.9 |
| | 3ssi_ | 12 | 84.6 | 91.7 | 0 | 0 | 100 | 72.2 | 0 | 0 |
| 2sni(E:I) | 1ubnA | 27 | 66.7 | 59.3 | 60.0 | 33.3 | 60.0 | 83.0 | 66.7 | 81.5 |
| | 2ci2I | 15 | 42.9 | 40.0 | 57.1 | 57.1 | 0 | 0 | 76.9 | 66.7 |
| 7cei(A:B) | 1unkD | 20 | 76.9 | 50.0 | 75.0 | 35.3 | 47.4 | 60.0 | 75.0 | 45.0 |
| | 1m08B | 16 | 64.3 | 56.3 | 40.0 | 37.5 | 0 | 0 | 13.8 | 25.0 |
| others | | | | | | | | | | |
| 1ak4(A:D) | 2cpl_ | 17 | 42.9 | 35.3 | 50.0 | 31.3 | 33.3 | 18.8 | 59.1 | 76.5 |
| | 1e6jP | 9 | 30.4 | 77.8 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1atn(A:D) | 1ijjB | 17 | 5.3 | 5.9 | 0 | 0 | 20.7 | 37.5 | 0 | 0 |
| | 3dni_ | 24 | 40.0 | 33.3 | 0 | 0 | 0 | 0 | 66.7 | 66.7 |
| 1b6c(A:B) | 1d6oA | 20 | 54.3 | 95.0 | 83.3 | 55.6 | 40.0 | 11.1 | 93.3 | 70.0 |
| | 1iasA | 20 | 44.0 | 55.0 | 54.5 | 25.0 | 31.6 | 25.0 | 0 | 0 |
| 1buh(A:B) | 1hcl_ | 16 | 68.4 | 81.3 | 0 | 0 | 6.3 | 11.8 | 0 | 0 |
| | 1dksA | 18 | 75.0 | 83.3 | 58.3 | 38.9 | 36.4 | 22.2 | 100 | 66.7 |
| 1e96(A:B) | 1mh1_ | 14 | 66.7 | 85.7 | 38.5 | 25.0 | 46.2 | 60.0 | 10.0 | 14.3 |
| | 1hh8A | 12 | 73.3 | 91.7 | 41.7 | 35.7 | 45.5 | 35.7 | 0 | 0 |
| 1fq1(A:B) | 1fpzF | 16 | 63.2 | 75.0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 1b39A | 16 | 63.2 | 75.0 | 0 | 0 | 30.0 | 23.1 | 17.1 | 37.5 |
| 1fqj(A:B) | 1tndC | 21 | 20.7 | 81.0 | 70.6 | 42.9 | 32.3 | 35.7 | 28.6 | 38.1 |
| | 1fqiA | 24 | 18.9 | 58.3 | 90.9 | 47.6 | 42.9 | 14.3 | 78.9 | 62.5 |
| 1gcq(B:C) | 1griB | 14 | 35.3 | 42.9 | 70.0 | 63.6 | 38.9 | 63.6 | 22.2 | 14.3 |
| | 1gcpB | 18 | 78.9 | 83.3 | 60.0 | 40.0 | 100 | 33.3 | 33.3 | 16.7 |
| 1ghq(A:B) | 1c3d_ | 10 | 41.7 | 100 | 0 | 0 | 42.9 | 37.5 | 0 | 0 |
| | 1ly2A | 9 | 47.4 | 100 | 0 | 0 | 42.9 | 66.7 | 8.7 | 22.2 |
| 1grn(A:B) | 1a4rA | 17 | 54.2 | 76.5 | 33.3 | 15.0 | 40.0 | 40.0 | 50.0 | 58.8 |
| | 1rgp_ | 22 | 50.0 | 54.5 | 16.7 | 4.5 | 100 | 13.6 | 78.9 | 68.2 |
| 1h1v(A:G) | 1ijjB | 24 | 28.6 | 41.7 | 46.2 | 13.0 | 35.3 | 26.1 | 38.8 | 76.0 |
| | 1d0nB | 25 | 43.8 | 56.0 | 0 | 0 | 40.0 | 4.9 | 4.7 | 12.0 |

Table 5 Detailed Results of DoBi, metaPPI, meta-PPISP and PPI-Pred on 41 complexes (Continued)

Complex    PDB     Intn  DoBi          metaPPI       meta-PPISP    PPI-Pred
                         Acc    Cov    Acc    Cov    Acc    Cov    Acc    Cov
1he1(C:A)  1mh1_   16    48.0   75.0   66.7   30.8   50.0   42.3   0      0
           1he9A   21    40.9   42.9   76.5   46.4   33.3   7.1    0      0
1he8(B:A)  821p_   13    20.6   100    0      0      43.8   33.3   26.7   61.5
           1e8zA   15    11.1   53.3   42.9   16.7   5.9    5.6    0.6    6.7
1i2m(A:B)  1qg4A   24    14.3   33.3   42.9   21.4   43.8   50.0   15.0   12.5
           1a12A   32    15.1   43.8   0      0      50.0   5.1    48.0   75.0
1ibr(A:B)  1qg4A   35    43.2   45.7   73.3   22.0   55.0   22.0   14.3   8.6
           1f59A   42    38.9   33.3   7.1    1.8    0      0      10.3   16.7
1kac(A:B)  1nobF   15    68.4   86.7   0      0      15.4   21.1   0      0
           1f5wB   21    83.3   95.2   60.0   28.6   71.4   23.8   35.3   28.6
1ktz(A:B)  1tgk_   9     26.7   44.4   45.5   62.5   13.3   25.0   50.0   88.9
           1m9zA   12    57.1   100    66.7   80.0   60.0   60.0   33.3   50.0
1kxp(A:D)  1ijjB   34    13.6   8.8    81.3   30.2   45.5   23.3   4.3    5.9
           1kw2B   41    32.0   19.5   0      0      75.0   13.0   48.9   56.1
1kxq(H:A)  1kxqH   25    12.1   16.0   91.7   30.6   78.6   30.6   18.2   8.0
           1ppi_   30    22.7   16.7   41.7   17.9   20.0   3.6    47.8   73.3
1m10(A:B)  1auq_   24    57.1   50.0   58.3   24.1   65.0   44.8   50.0   45.8
           1m0zB   29    68.0   58.6   0      0      31.6   18.2   0      0
1qa9(A:B)  1hnf_   16    76.2   100    0      0      27.3   17.6   10.0   12.5
           1cczA   16    82.4   87.5   6.7    5.3    22.2   10.5   28.6   25.0
1sbb(A:B)  1bec_   13    54.2   100    0      0      17.6   17.6   0      0
           1se4_   11    50.0   100    0      0      50.0   12.5   10.0   27.3
1wq1(R:G)  6q21D   26    61.5   61.5   66.7   32.3   41.7   32.2   76.2   61.5
           1wer_   33    62.5   45.5   100    26.5   36.4   11.8   70.0   63.6
2btf(A:P)  1ijjB   26    63.3   73.1   53.3   32.0   25.0   12.0   22.0   42.3
           1pne_   23    56.0   60.9   0      0      70.0   28.0   0      0

PDB is the unbound structure of each of the two proteins in the complex.
Intn is the number of residues on the actual interface in the complex.
Acc (%) is the accuracy of the corresponding method on the data set.
Cov (%) is the coverage of the corresponding method on the data set.
E-I is the enzyme-inhibitor type.
The values for metaPPI and meta-PPISP are from the literature [14].
The results for PPI-Pred are calculated using the same definition of the actual interface as DoBi.
The binding sites between chain E and chain I of 1acb are predicted by each method; the two unbound structures are chain B of 2cga and the only chain of 1egl.



Table 6 Detailed Results of DoBi, metaPPI, meta-PPISP and PPI-Pred on 27 targets

         Protein 1                                             Protein 2
Complex  Intn  DoBi        metaPPI     meta-PPISP  PPI-Pred    Intn  DoBi        metaPPI     meta-PPISP  PPI-Pred
               Acc   Cov   Acc   Cov   Acc   Cov   Acc   Cov         Acc   Cov   Acc   Cov   Acc   Cov   Acc   Cov
T01      11    46.2  54.5  -     -     83.3  62.5  -     -     13    38.9  53.8  -     -     0     0     -     -
T02      7     24.1  100   -     -     72.2  43.3  -     -     6     21.4  100   -     -     0     0     -     -
T03      10    12.0  30.0  -     -     60.0  75.0  -     -     15    32.0  53.3  -     -     19.6  18.0  -     -
T04      19    50.0  89.5  0     0     58.3  38.9  2.4   3.6   18    37.5  100   64.3  40.9  0     0     71.4  68.2
T05      20    29.2  35.0  0     0     52.6  33.3  4.8   9.1   17    14.3  35.3  90.0  39.1  4.5   7.7   38.9  30.4
T06      23    28.6  34.8  71.4  29.4  39.1  27.3  59.5  73.5  29    38.1  27.6  28.6  15.4  25.8  66.7  4.5   3.8
T07      15    52.9  60.0  33.3  30.8  33.3  30.8  0     0     11    15.4  18.2  7.7   5.6   5.6   4.3   0     0
T08      25    37.9  44.0  0     0     9.5   8.3   0     0     23    64.0  69.6  30.0  11.5  0     0     7.9   11.5
T09      37    90.5  51.4  80.0  20.0  0     0     25.8  20.0  37    76.7  62.2  45.5  12.5  0     0     16.1  12.5
T10      46    40.0  47.8  -     -     10.0  47.4  -     -     53    50.0  49.1  -     -     0     0     -     -
T11      12    50.0  91.7  86.7  59.1  -     -     45.8  50.0  28    71.9  82.1  81.8  50.0  -     -     56.5  72.2
T12      12    16.7  25.0  93.8  62.5  61.5  30.8  45.5  41.7  28    86.4  67.9  55.6  33.3  36.0  45.0  22.2  13.3
T13      10    33.3  100   -     -     0     0     -     -     8     44.4  100   -     -     72.0  85.7  -     -
T14      53    52.2  22.6  10.0  2.3   6.8   33.3  8.6   7.0   63    42.3  17.5  50.0  13.2  13.5  19.2  2.0   2.6
T15      23    95.0  82.6  0     0     63.2  50.0  5.0   11.1  19    81.0  89.5  15.8  33.3  56.5  72.2  9.1   11.1
T16      -     -     -     55.6  21.7  87.0  74.1  0     0     -     -     -     100   29.0  25.0  53.8  61.8  67.7
T17      -     -     -     0     0     23.1  12.5  0     0     -     -     -     92.9  65.0  0     0     33.3  45.0
T18      24    53.6  62.5  85.7  50.0  42.9  36.0  46.2  50.0  31    50.0  35.5  0     0     52.2  36.4  2.1   3.4
T19      12    68.8  91.7  -     -     33.3  28.0  -     -     12    45.0  75.0  -     -     69.2  62.1  -     -
T20      47    53.6  31.9  94.4  37.8  23.8  90.9  28.6  22.2  35    72.2  37.1  72.2  36.1  34.3  54.5  23.2  63.9
T21      17    73.7  82.4  0     0     0     0     3.0   6.7   15    55.6  66.7  0     0     33.3  20.8  0     0
T22      17    22.7  29.4  9.1   6.7   28.6  17.4  0     0     12    71.4  83.3  83.3  41.7  6.2   5.9   60.0  75.0
T23      49    95.6  87.8  64.3  17.0  18.2  53.3  66.0  62.3  49    95.3  83.7  64.3  17.0  0     0     66.0  62.3
T24      3     13.3  66.7  66.7  66.7  -     -     50.0  73.3  1     5.6   100   0     0     -     -     50.0  61.5
T25      -     -     -     100   68.2  20.0  23.5  81.8  81.8  -     -     -     58.3  31.8  73.9  77.3  55.6  90.9
T26      34    43.8  41.2  75.0  27.3  20.8  33.3  0     0     24    61.5  66.7  21.4  12.5  18.2  60.0  18.2  8.3
T27      7     43.8  87.5  0     0     0     0     6.7   22.2  8     50.0  91.7  20.0  22.2  0     0     0     0

Intn is the number of residues on the actual interface in the complex.
Acc (%) is the accuracy of the corresponding method on the data set.
Cov (%) is the coverage of the corresponding method on the data set.
The values for metaPPI, meta-PPISP and PPI-Pred are from the literature [10,14].

Figure 3 Configuration discovered by DoBi for 1qa9(A:B). (A) is the configuration by DoBi; (B) is the experimental structure. The Cα iRMSD
between the two complexes is 2.36 Å.



have the F-scores 0.35, 0.43 and 0.32 respectively. DoBi 
has a success rate of 53.7%, as well as overall accuracy and 
coverage of 50.0% and 60.0% respectively. 

The detailed results on all the unbound structures of 
the 41 complexes are displayed in Table 5. The detailed 
results on 27 CAPRI targets are displayed in Table 6. Each 
row displays the results of the methods tested on the two 
corresponding binding partners. 

Besides identifying binding sites, our program also
estimates the orientations and positions of the proteins
after binding. Figure 3 displays the orientation and
position discovered by our program for 1qa9(A:B). The
interface RMSD (root mean squared deviation, iRMSD)
between the experimental structure and the predicted
complex is 2.36 Å.
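
The iRMSD figure quoted above can be illustrated with a minimal Python sketch. It assumes that the interface Cα atoms of the predicted and experimental complexes have already been paired one-to-one and superimposed; the function name `rmsd` and the tuple-based coordinate format are illustrative choices, not part of DoBi.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally long lists of
    (x, y, z) coordinates, matched by index (no superposition is done)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two toy "interfaces" that differ by a rigid shift of 2 units along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(predicted, experimental))  # 2.0
```

In practice the interface atoms would first be selected and optimally superimposed; that step is deliberately left out of this sketch.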

Comparison to ProMate and PINUP 

In this experiment, DoBi is compared to ProMate and
PINUP. The test data set was originally used by ProMate
and consists of 57 non-homologous proteins. The results are
reported in Table 7. DoBi has an F-score of 0.56, while
PINUP and ProMate have F-scores of 0.43 and 0.21,
respectively. The overall accuracy and coverage of DoBi
are 54.2% and 59.1%, and its success rate is 64.9%.
Hence the success rate is improved by at least 1.8%, while
the overall accuracy and coverage are improved by at least
1.7% and 16.6%, respectively.

The average sizes of the interfaces predicted by DoBi,
PINUP and ProMate are 23.5, 19.0 and 5.4 residues,
respectively, while the actual average size (the average
number of actual interface residues) is 21.0 residues. The
average numbers of residues correctly predicted to be on
the interface by DoBi, PINUP and ProMate are 12.3, 8.3
and 2.7, respectively.

Table 8 shows the detailed results for the 57 unbound
proteins. DoBi performs better in most cases. However,
in some cases where none of the three methods performs
well, DoBi is usually the worst, e.g. 1avu_, 1aye_,
1qqrA and 1b1eA.

Comparison to core-SVM 

In this study, we compare DoBi to core-SVM using the
same data set of 50 dimers on which core-SVM was tested
[12]. The results are shown in Table 9. The overall
accuracy and coverage of our method are 59.0% and
61.1%, while those of core-SVM are 53.4% and 60.6%. The
success rate of DoBi is 70.0% on the 50 pairs of proteins in
those binary complexes. The F-score is 0.60 for DoBi and
0.56 for core-SVM. The average size predicted by
DoBi is 39.0 residues (with standard deviation 19.1), while
the average actual size is 40.3 residues. The average number
of residues correctly predicted by DoBi to be on the interface
is 22.5.

Table 10 shows the details for DoBi on the data set used
by core-SVM. The performance of DoBi is particularly
good on several proteins, such as 1aym2 and 1rzhM.
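
The per-protein figures in Table 10 follow from three counts: Intn (residues on the actual interface), Cn (residues correctly predicted) and Pn (residues predicted). The sketch below is an illustration under stated assumptions: accuracy = Cn/Pn and coverage = Cn/Intn reproduce the 1a9xA row of Table 10, while the F-score is assumed here to be the usual harmonic mean of accuracy and coverage. The function names are hypothetical and do not come from the DoBi package.

```python
def accuracy(cn, pn):
    """Percentage of predicted interface residues that are correct (Cn / Pn)."""
    return 100.0 * cn / pn if pn else 0.0


def coverage(cn, intn):
    """Percentage of actual interface residues that were recovered (Cn / Intn)."""
    return 100.0 * cn / intn if intn else 0.0


def f_score(acc, cov):
    """Harmonic mean of accuracy and coverage (assumed definition)."""
    return 2.0 * acc * cov / (acc + cov) if (acc + cov) else 0.0


# Row 1a9xA of Table 10: Intn = 59, Cn = 52, Pn = 95.
acc = accuracy(cn=52, pn=95)
cov = coverage(cn=52, intn=59)
print(round(acc, 1), round(cov, 1))  # 54.7 88.1
```

Note that the overall F-scores reported in the tables may be averaged or rounded differently, so this formula is not guaranteed to match them to the last digit.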

Evaluation on benchmark v4.0 

To further evaluate our method, we perform tests on 
the protein-protein docking benchmark v4.0 [32,33]. This 



Table 7 Comparison to PINUP and ProMate

Method    Suc    Acc    Cov    F      M      V
DoBi      64.9   54.2   59.1   0.56   23.5   10.5
PINUP     42.1   44.9   42.5   0.43   19.0   8.7
ProMate   63.1   52.5   13.2   0.21   5.4    16.8

All values are overall results on the data set.
Suc (%) is the success rate of the corresponding method on the data set.
Acc (%) is the average accuracy of the corresponding method on the data set.
Cov (%) is the average coverage of the corresponding method on the data set.
M is the average of the sizes predicted by the corresponding method on the data set.
V is the standard deviation of the sizes predicted by the corresponding method on the data set.
F is the F-score of the corresponding method on the data set.

Table 8 Detailed Comparison to PINUP and ProMate



PDB     Complex     Intn  DoBi          PINUP         ProMate
                          Acc    Cov    Acc    Cov    Acc    Cov
1a19A   1brs(A:D)   16    86.7   81.3   72.2   81.3   100    29
1a2pA   1brs(D:A)   19    76.2   84.2   63.6   73.7   90     19
1a5e_   1bi7(B:A)   30    82.1   76.7   41.2   23.3   88     10
1acl_   1fss(A:B)   25    36.7   72.0   35.9   56.0   24     14
1ag6_   2pcf(A:B)   24    65.0   54.2   56.3   37.5   70     16
1aje_   1am4(D:A)   18    57.1   22.2   60.0   33.3   72     30
1ajw_   1cc0(E:A)   9     50.0   88.9   66.7   66.7   73     24
1aueA   1fap(B:A)   8     58.3   87.5   15.8   37.5   90     35
1avu_   1avw(B:A)   15    30.0   40.0   66.7   93.3   100    29
1aye_   1dtd(A:B)   22    42.1   36.4   44.4   54.5   54     24
1b1eA   1a4y(B:A)   32    38.7   37.5   88.2   46.9   69     24
1bip_   1tmq(B:A)   29    66.7   55.2   61.1   37.9   100    27
1ctm_   2pcf(B:A)   21    62.1   85.7   38.1   38.1   100    12
1cto_   1cd9(B:A)   6     40.0   33.3   35.3   100    36     29
1cye_   1eay(A:B)   16    55.6   62.5   5.6    6.3    0      0
1d0nA   1c0f(S:A)   27    46.2   44.4   0      0      67     3
1d2bA   1uea(B:A)   19    66.7   52.6   78.6   57.9   92     31
1ekxA   1d09(A:B)   21    64.5   95.2   0      0      0      0
1ex3A   1cgi(E:I)   33    61.1   33.3   68.2   45.5   100    29
1ez3A   1dn1(B:A)   18    88.9   44.4   47.1   44.4   100    6
1eza_   3eza(A:B)   21    64.0   76.2   0      0      0      0
1eztA   1agr(E:A)   22    57.1   54.5   22.2   18.2   54     13
1f00I   1f02(I:T)   17    31.6   35.3   0      0      0      0
1f5wA   1kac(B:A)   21    71.4   71.4   25.0   23.8   100    6
1fkl_   1b6c(A:B)   19    54.5   63.2   75.0   47.4   100    20
1flzA   1eui(A:C)   25    42.9   96.0   77.3   68.0   52     19
1fvhA   1dn1(A:B)   42    51.4   45.2   53.3   38.1   0      0
1g4kA   1uea(A:B)   30    46.2   40.0   43.8   23.3   78     21
1gc7A   1ef1(A:C)   18    71.4   55.6   28.6   11.1   78     6
1gnc_   1cd9(A:B)   15    43.7   46.7   21.4   20.0   6      2
1hh8A   1e96(B:A)   14    50.0   35.7   44.0   78.6   50     2
1hplA   1eth(A:B)   19    20.0   36.8   8.7    10.5   7      3
1hu8A   1ycs(A:B)   8     37.5   75.0   31.6   75.0   5      2
1iob_   1itb(A:B)   38    38.1   21.1   46.7   18.4   31     6
1j6zA   1c0f(A:S)   29    28.2   75.9   34.6   31.0   0      0
1jae_   1tmq(A:B)   32    60.0   65.6   83.3   46.9   50     13
1lba_   1aro(L:P)   16    8.6    18.8   40.0   37.5   60     24
1nobA   1kac(A:B)   15    50.0   73.3   0      0      7      3
1nos_   1noc(A:B)   9     33.3   44.4   0      0      0      0
1pco_   1eth(B:A)   15    77.8   46.7   16.7   20.0   60     12
1pne_   1hlu(P:A)   25    65.7   92.0   93.8   60.0   0      0
1poh_   1ggr(B:A)   10    57.1   40.0   72.7   80.0   0      0
1ppp_   1stf(E:I)   29    79.3   79.3   47.4   31.0   91     30
1qqrA   1bml(C:A)   7     33.3   28.6   38.5   71.4   85     32
1rgp_   1am4(A:D)   16    55.0   68.8   36.8   43.8   50     5
1selA   1cse(E:I)   29    75.0   93.1   60.9   48.3   61     27
1vin_   1fin(B:A)   29    40.0   34.5   50.0   51.7   0      0
1wer_   1wq1(G:R)   33    67.7   63.6   70.6   36.4   0      0
1xpb_   1jtg(A:B)   32    69.2   56.3   89.5   53.1   0      0
2bnh_   1a4y(A:B)   38    38.5   39.5   37.8   36.8   100    4
2cpl_   1ak4(A:D)   17    61.9   76.5   78.6   64.7   76     23
2f3gA   1ggr(A:B)   18    50.0   50.0   64.7   61.1   100    12
2nef_   1avz(B:A)   10    56.3   90.0   30.8   40.0   57     24
2rgf_   1lfd(A:B)   14    52.4   78.6   27.8   35.7   20     5
3ssi_   2sic(I:E)   15    80.0   80.0   68.2   100    100    24
6ccp_   2pcb(A:B)   9     23.5   44.4   28.6   66.7   0      0
Bound   1jtg(B:A)   32    81.1   93.8   65.0   40.6   94     22

PDB is the unbound structure of the predicted protein.
Intn is the number of residues on the actual interface in the complex.
Acc (%) is the accuracy of the corresponding method on the data set.
Cov (%) is the coverage of the corresponding method on the data set.
Bound: the unbound structure of 1jtgB was not available in PDB, and we used the bound structure instead.
The values for ProMate are from the literature [9].
The results for PINUP are calculated using the same definition of the actual interface as DoBi.



benchmark consists of 176 complexes. Proteins dynamically
change their conformations upon binding to other
proteins [34]. A protein that is not bound to any
other structure is referred to as unbound, whereas a
protein with a binding partner in a complex is referred to as
bound. We test our method in both the bound and the
unbound cases.

Running time 

We ran DoBi on a Pentium(R) 4 CPU at 3.40 GHz.
The computation for each of the 176 complexes took 100
seconds on average.

Results on bound states 

The complexes are classified into broad biochemical
categories: Enzyme-Inhibitor (52), Antibody-Antigen (25) and
Others (99). The average accuracy and coverage of DoBi
are 61.8% and 67.9%, respectively, on the 52 complexes in
Enzyme-Inhibitor, 51.6% and 70.1% on the 25 complexes
in Antibody-Antigen, and 58.2% and 69.1% on the 99
complexes in Others. A success rate of 77.6% is achieved for
the Enzyme-Inhibitor complexes. The details are shown in
Table 11.

Results on unbound states 

The pairs of unbound proteins are classified into three
categories: 121 rigid-body (easy) cases, 30 medium difficult
cases, and 25 difficult cases, according to the magnitude
of conformational change after binding [30]. The average
accuracy and coverage of DoBi are 43.6% and 65.4% on the
121 rigid-body cases, 34.1% and 56.7% on the 30 medium
difficult cases, and 32.4% and 53.4% on the 25 difficult


Table 9 Comparison to core-SVM

Method     Suc    Acc    Cov    F      M      V
DoBi       70.0   59.0   61.1   0.60   39.0   19.1
core-SVM   -      53.4   60.6   0.56   -      -

All values are overall results on the data set.
Suc (%) is the success rate of the corresponding method on the data set.
Acc (%) is the average accuracy of the corresponding method on the data set.
Cov (%) is the average coverage of the corresponding method on the data set.
M is the average predicted size for DoBi on the data set.
V is the standard deviation of the predicted size for DoBi on the data set.
F is the F-score of the corresponding method on the data set.
The values for core-SVM are from the literature [12].

Table 10 Detailed Results for DoBi on the data set used by core-SVM



Protein ID  Partner ID  Intn  Cn    Pn    Acc    Cov
1a9xA       1a9xB       59    52    95    54.7   88.1
1a9xB       1a9xA       52    47    88    53.4   90.4
1aym1       1aym3       46    38    41    92.7   82.6
1aym2       1aym1       57    54    70    77.1   94.7
1aym3       1aym1       43    33    36    91.7   76.7
1blxA       1blxB       21    15    33    45.5   71.4
1fzcB       1fzcC       45    38    58    65.5   84.4
1g4yR       1g4yB       29    5     18    27.8   17.2
1gk8A       1gk8I       49    28    55    50.9   57.1
1h1rB       1h1rA       33    9     14    64.3   27.2
1h8eC       1h8eD       69    37    67    55.2   53.6
1h8eD       1h8eC       35    19    39    48.7   54.3
1hxs4       1hxs1       31    21    35    60.0   67.7
1irdB       1irdA       23    20    32    62.5   86.9
1j34A       1j34B       43    19    22    86.4   44.1
1jb0B       1jb0A       36    16    29    55.2   44.4
1jsdA       1jsdB       51    18    20    90.0   35.3
1jsdB       1jsdA       67    26    42    61.9   38.8
1kSnA       1kSnB       35    24    56    42.9   68.6
1kSnB       1kSnA       25    16    39    41.0   64.0
1ld8A       1ld8B       35    23    28    82.1   65.7
1mtyB       1mtyD       58    22    34    64.7   38.1
1mtyD       1mtyB       31    10    15    66.7   32.2
1mtyG       1mtyD       41    18    42    42.9   43.9
1n4qB       1n4qA       25    5     15    33.3   20.0
1p2jA       1p2jI       23    18    36    50.0   78.2
1p2jI       1p2jA       14    13    21    61.9   92.9
1qopA       1qopB       35    32    52    61.5   91.4
1qopB       1qopA       34    31    51    60.8   91.2
1rthA       1rthB       57    32    68    47.0   56.1
1rthB       1rthA       58    33    69    47.8   56.9
1rypB       1rypA       31    13    24    54.1   41.9
1rzhH       1rzhM       37    8     16    50.0   21.6
1rzhL       1rzhM       48    42    45    93.3   87.5
1rzhM       1rzhL       51    45    48    93.8   88.2
1s5dD       1s5dA       4     4     29    13.7   100
1tugA       1tugB       17    14    39    35.9   82.4
1tugB       1tugA       12    9     24    37.5   75.0
1tx4B       1tx4A       25    18    34    52.9   72.0
1uvqA       1uvqB       61    35    39    89.7   57.4
1uvqB       1uvqA       55    26    31    83.9   47.2
1we3F       1we3T       12    10    48    20.8   83.3
1wf4o       1wf4a       10    10    19    52.6   100
2ltnA       2ltnB       55    12    16    75.0   21.8
2ltnB       2ltnA       47    17    17    100    36.2
3pcgA       3pcgM       41    12    15    80.0   29.3
3pcgM       3pcgA       40    11    21    52.4   27.5
4ubpA       4ubpC       24    8     43    18.6   33.3
4ubpC       4ubpB       46    26    86    30.2   56.5
8rucI       8rucA       38    29    38    76.3   76.3

Intn is the number of residues on the actual interface in the complex.
Cn is the number of residues correctly predicted to be on the interface by our method.
Pn is the total number of residues predicted to be on the interface by our method.
Acc (%) is the accuracy of our method on the data set.
Cov (%) is the coverage of our method on the data set.



Table 11 DoBi's performance for proteins of benchmark v4.0 in bound states

Type               No. of complexes  Suc    Acc    Cov    M      V
Enzyme-Inhibitor   52                77.6   61.8   67.9   22.6   6.3
Antibody-Antigen   25                56.0   51.6   70.1   19.3   6.5
Others             99                66.7   58.2   69.1   24.0   10.8
Overall            176               68.2   57.5   68.9   22.9   9.3

Type is based on the broad biochemical categories.
Suc (%) is the success rate of DoBi on the data set.
Acc (%) is the average accuracy of DoBi on the data set.
Cov (%) is the average coverage of DoBi on the data set.
M is the average of the sizes predicted by DoBi on the data set.
V is the standard deviation of the sizes predicted by DoBi on the data set.



Table 12 DoBi's performance for proteins of benchmark v4.0 in unbound states

Subset            Type               No. of cases  Suc    Acc    Cov    M      V
Rigid body        Enzyme-Inhibitor   40            51.2   48.9   66.9   37.1   34.1
                  Antibody-Antigen   22            50.0   51.0   67.8   24.0   14.6
                  Others             59            32.2   37.3   63.5   39.9   36.9
                  Subtotal           121           41.7   43.6   65.4   36.1   31.9
Medium difficult  Enzyme-Inhibitor   7             39.9   36.7   56.2   25.9   17.4
                  Antibody-Antigen   1             0      31.9   41.4   38.0   9.2
                  Others             22            31.2   33.4   56.7   52.9   56.7
                  Subtotal           30            31.6   34.1   56.7   46.1   45.9
Difficult         Enzyme-Inhibitor   5             37.5   43.1   46.5   26.1   7.0
                  Antibody-Antigen   2             0      29.5   54.6   27.3   17.5
                  Others             18            10.5   30.5   54.8   54.9   44.8
                  Subtotal           25            13.9   32.4   53.4   46.9   35.1
Overall                              176           36.0   40.4   62.2   39.3   36.9

Subset is based on the magnitude of conformational change after binding.
Type is based on the broad biochemical categories.
Suc (%) is the success rate of DoBi on the data set.
Acc (%) is the average accuracy of DoBi on the data set.
Cov (%) is the average coverage of DoBi on the data set.
M is the average predicted size for DoBi on the data set.
V is the standard deviation of the predicted size for DoBi on the data set.

Table 13 The Docking Results of DoBi, ZDOCK and 3D-Dock on CAPRI



Target  DoBi_1000                ZDOCK                    DoBi_10                  3D-Dock
        iRMSD  NC    Fl    Fr    iRMSD  NC    Fl    Fr    iRMSD  NC    Fl    Fr    iRMSD  NC
T1      4.28   27.6  56.0  45.2  8.10   17.2  50.0  32.0  5.45   44.0  74.1  64.3  3.0    46
T2      6.23   76.9  38.8  35.3  4.15   46.2  51.9  35.7  8.27   53.8  48.0  36.4  -      -
T3      18.48  9.4   17.1  43.9  3.89   62.5  64.0  60.6  18.51  12.0  22.9  51.4
T4      3.98   63.5  66.6  57.1  4.50   23.1  78.2  58.3  6.24   35.9  38.3  51.6  15.1   21
T5      11.06  7.7   46.8  31.6  10.08  5.4   76.6  18.9  11.06  7.7   46.8  31.6
T6      16.49  15.4  36.4  33.4  8.72   29.2  54.2  71.6  19.21  9.6   18.2  28.1  0.8    86
T7      11.10  13.5  62.8  24.0  6.43   2.7   44.4  4.8   11.10  13.5  62.8  24.0  28.6   14
T8      6.69   37.9  42.7  60.9  2.73   63.6  82.8  60.0  6.69   37.9  42.7  60.9  1.7    33
T9      2.85   33.3  61.3  67.6  8.46   28.9  54.1  58.7  10.54  1.4   36.7  37.7  9.7    23
T10     4.52   28.9  50.4  51.8  14.75  5.9   15.4  17.3  7.69   13.0  58.1  59.3  34.8   0
T11     2.55   66.7  68.5  75.0  2.63   61.1  96.0  82.1  12.17  0     0     45.0  1.9    20
T12     2.55   66.7  68.5  75.0  2.31   81.5  75.9  88.9  12.17  0     0     45.0  3.2    22
T13     3.33   94.1  74.1  69.6  2.49   57.1  52.9  59.3  3.33   94.1  74.1  69.6  6.4    6
T14     19.98  9.6   34.5  28.0  5.22   42.0  72.7  68.9  20.97  10.3  36.1  28.3  0.9    47
T15     2.40   53.6  86.9  83.0  0.86   91.1  90.6  81.8  4.00   42.0  64.2  63.6
T18     8.08   25.0  57.7  44.4  1.88   66.2  80.0  80.0  11.38  8.2   10.3  19.7  9.4    14
T19     2.74   58.8  60.0  69.0  9.81   4.8   40.0  14.6  2.74   58.8  60.0  69.0  3.9    31
T20     15.13  1.1   14.7  28.6  13.62  7.2   35.0  37.1  15.13  1.1   14.7  28.6
T21     2.02   50.0  77.8  68.8  2.43   70.7  83.3  70.6  2.02   50.0  77.8  68.8
T22     16.08  7.5   20.0  71.4  9.28   12.6  66.7  0     16.08  7.5   20.0  71.4
T23     1.90   61.2  86.9  88.4  2.14   72.1  87.3  87.9  3.14   46.0  83.1  83.1
T24     5.01   50.0  31.6  20.0  28.15  0     0     0     5.01   50.0  31.6  20.0
T26     7.11   29.6  26.1  45.2  30.07  0     0     0     7.11   29.6  26.1  45.2
T27     6.95   60.0  42.4  51.9  15.89  3.5   24.4  0     7.38   66.7  38.5  50.0
T29     2.46   68.6  83.3  79.3  3.90   58.6  77.4  72.1  3.80   32.7  69.4  77.8

iRMSD is the Cα iRMSD (Å) between the configuration produced by the respective method and the experimental structure.
NC (%) is the fraction of native contacts for each method.
Fl (%) is the F-score of each method for the ligand protein on the data set.
Fr (%) is the F-score of each method for the receptor protein on the data set.
The values for 3D-Dock are from the literature [36,37]; blank entries mean that 3D-Dock produced no result for these targets.

Table 14 The Docking Results of DoBi and ZDOCK on Benchmark v4.0



      DoBi_1000          ZDOCK                    DoBi_1000          ZDOCK                    DoBi_1000          ZDOCK
PDB   iRMSD  Fl    Fr    iRMSD  Fl    Fr    PDB   iRMSD  Fl    Fr    iRMSD  Fl    Fr    PDB   iRMSD  Fl    Fr    iRMSD  Fl    Fr
1bvk  1.24   71.8  72.7  1.72   71.4  80.0  1jps  4.27   66.7  62.8  2.26   78.3  82.6  1gia  6.51   77.3  72.4  3.76   70.3  72.0
2sni  1.49   92.9  82.8  2.55   90.0  78.3  1yvb  4.44   71.0  51.3  1.61   82.4  91.3  1acb  6.55   78.8  78.0  2.61   93.8  82.6
1j2j  1.52   80.0  83.9  2.18   66.7  56.4  1avx  4.54   66.7  70.2  1.67   73.3  88.5  2i25  6.57   46.2  68.6  1.40   80.0  72.0
1wq1  1.60   88.5  76.9  1.82   77.6  69.2  1fq1  4.54   62.9  76.5  8.05   42.4  50.0  1z0k  6.60   72.7  55.2  2.29   90.3  75.0
1rv6  1.68   80.0  88.2  1.43   86.7  83.3  1e6e  4.58   67.9  60.9  1.11   85.0  85.7  1fc2  6.88   59.5

iRMSD is the Cα iRMSD (Å) between the configuration produced by each method and the experimental structure.
Fl (%) is the F-score of each method for the ligand protein on the data set.
Fr (%) is the F-score of each method for the receptor protein on the data set.




Figure 5 Configuration discovered by DoBi and ZDOCK for 1i4d. (A) is the configuration by DoBi; (B) is the configuration by ZDOCK; (C) is the
experimental structure.

cases. The success rate of DoBi is 41.7% for the rigid-body
cases, which is considerably higher than for the other two
categories. In general, the accuracy and coverage decrease
as the magnitude of conformational change increases. The
details are shown in Table 12.
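
As a quick consistency check on the unbound results, the overall accuracy and coverage in Table 12 agree with a case-weighted mean of the three subset averages quoted in the text. The sketch below assumes simple weighting by the number of cases in each subset; the helper name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Mean of `values`, each weighted by the matching entry of `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Rigid-body, medium difficult and difficult subsets (Table 12 subtotals).
cases = [121, 30, 25]
subset_acc = [43.6, 34.1, 32.4]
subset_cov = [65.4, 56.7, 53.4]

overall_acc = weighted_mean(subset_acc, cases)
overall_cov = weighted_mean(subset_cov, cases)
print(round(overall_acc, 1), round(overall_cov, 1))  # 40.4 62.2
```

Both values match the Overall row of Table 12 (40.4% and 62.2%).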

DoBi discovered several good configurations for the
medium difficult cases. One of the instances is 1wq1(R:G).
Its configuration discovered by DoBi is shown in Figure 4.
The Cα iRMSD between the experimental structure and
the predicted complex is 4.12 Å.

Docking result of DoBi 

DoBi is optimized for binding site prediction, but it can
also be used to dock two protein structures. We compare
DoBi's poses to the best configurations obtained by
ZDOCK and 3D-Dock. ZDOCK [35] uses a fast Fourier
transform to search all possible binding modes for the
proteins, and evaluates them based on shape
complementarity, desolvation energy, and electrostatics. It can
produce structures with the smallest iRMSD values among
the top 1000 predictions with minimum energy. 3D-Dock [36,37]
uses an initial grid-based shape complementarity search
to produce many potential interacting conformations,
which can be ranked using interface residue propensities
and interaction energies. It reports structures with the
smallest iRMSD values among the top ten predictions.

We calculate the structures predicted by the different
methods on the complexes in benchmark v4.0 and the targets
in CAPRI. CAPRI is a community-wide experiment to
assess the capacity of protein docking methods to predict
protein-protein interactions [31]. The Cα iRMSD, the F-score
and the fraction of native contacts are used to evaluate
the results of the different methods. The fraction of native
contacts is used by 3D-Dock [37]. It is calculated as the
number of native contacts in the predicted configuration
divided by the total number of contacts in the
native structure. A native contact exists between residues
i and j if the distances between them in the native structure
and in the predicted configuration are both less than 4.5 Å.
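
The definition above can be sketched in a few lines of Python. This is an illustration only: it assumes the residue-residue distances have already been computed and stored in dictionaries keyed by residue pair (how the distance between two residues is measured is not specified here), and the function name `fraction_native_contacts` is hypothetical.

```python
CONTACT_CUTOFF = 4.5  # Å, the threshold stated in the text


def fraction_native_contacts(native_dist, predicted_dist, cutoff=CONTACT_CUTOFF):
    """Percentage of contacts of the native structure that are also
    contacts in the predicted configuration.

    Both arguments map a residue pair (i, j) to a distance in Å."""
    native_contacts = {pair for pair, d in native_dist.items() if d < cutoff}
    if not native_contacts:
        return 0.0
    preserved = sum(
        1
        for pair in native_contacts
        # A pair missing from the prediction is treated as not in contact.
        if predicted_dist.get(pair, float("inf")) < cutoff
    )
    return 100.0 * preserved / len(native_contacts)


# Toy example: three native contacts, two of which survive in the prediction.
native = {("A10", "B5"): 3.8, ("A11", "B5"): 4.2, ("A12", "B7"): 6.0, ("A13", "B9"): 4.4}
predicted = {("A10", "B5"): 4.0, ("A11", "B5"): 7.5, ("A12", "B7"): 3.0, ("A13", "B9"): 4.49}
nc = fraction_native_contacts(native, predicted)
```

Here the pair ("A12", "B7") is close in the prediction but not in the native structure, so it does not count as a native contact.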

We compare the docking results of DoBi, ZDOCK
and 3D-Dock on the CAPRI targets. The results are
shown in Table 13. The top 1,000 configurations
predicted by DoBi and ZDOCK are used for comparison.
Among the 1,000 predictions, we choose the configuration
with the best iRMSD value to evaluate each method.
The average iRMSD values for DoBi and ZDOCK are
7.5 Å and 6.9 Å, respectively. However, the average
fractions of native contacts for DoBi and ZDOCK are 40.6%
and 35.2%, respectively. DoBi improves the F-score of
binding site prediction by at least 1.3%. DoBi's docking
performance is worse than that of ZDOCK, but its
binding site prediction is more accurate than that of
ZDOCK.
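
The "best of the top N" evaluation protocol described above can be sketched as follows. This is a minimal illustration, assuming each pose is represented as a (score, iRMSD) pair and that a lower score ranks higher, in line with the "minimum energy" wording; the function name `best_of_top_n` and the data layout are hypothetical.

```python
def best_of_top_n(poses, n):
    """Return the pose with the smallest iRMSD among the n best-scored poses.

    `poses` is a list of (score, irmsd) tuples; lower score = better rank
    (an assumption made for this sketch)."""
    if not poses or n < 1:
        raise ValueError("need at least one pose and n >= 1")
    top_n = sorted(poses, key=lambda pose: pose[0])[:n]
    return min(top_n, key=lambda pose: pose[1])


poses = [(-10.0, 8.0), (-9.0, 2.5), (-3.0, 1.0)]
print(best_of_top_n(poses, 2))  # (-9.0, 2.5)
print(best_of_top_n(poses, 3))  # (-3.0, 1.0)
```

With n = 1000 this mirrors the DoBi and ZDOCK comparison, and with n = 10 the comparison against 3D-Dock.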



DoBi and 3D-Dock each produced ten results for
each target, and the configuration with the smallest iRMSD
value among those ten predictions is used for comparison.
The average iRMSD values for DoBi and 3D-Dock
are 9.2 Å and 9.1 Å, respectively. However, the overall
fractions of native contacts for DoBi and 3D-Dock are
29.1% and 26.8%. DoBi's performance on binding site
prediction is better than that of 3D-Dock.

The docking results obtained by DoBi and ZDOCK on
Benchmark v4.0 are shown in Table 14. As before, for each
target we compare the best configurations among the top
1,000 predictions of DoBi and ZDOCK.
The average iRMSD values of DoBi and ZDOCK are
6.1 Å and 4.9 Å, respectively. For binding site prediction,
the overall F-scores for ligand proteins by DoBi
and ZDOCK are 69.5% and 69.4%, and those for receptor
proteins by DoBi and ZDOCK are 68.2% and 66.1%,
respectively. These results indicate that DoBi's performance
on binding site prediction is better than that of ZDOCK,
whereas its docking quality still needs to be improved.
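For reference, the F-score of a binding site prediction can be computed from sets of residue identifiers as sketched below. The sketch assumes the usual definition of the F-score as the harmonic mean of precision and recall over interface residues; the exact definition used for the values reported here is the one given in the Methods, and the function name is hypothetical.

```python
def binding_site_f_score(predicted, actual):
    """F-score of predicted interface residues against the true interface.

    predicted, actual: iterables of residue identifiers (e.g. residue numbers).
    Assumption: F = 2 * precision * recall / (precision + recall).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # correct share of predictions
    recall = true_positives / len(actual)        # recovered share of interface
    return 2 * precision * recall / (precision + recall)
```

The score is computed separately for the receptor and the ligand protein, which is why two F-score columns are reported per complex.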

We also compute the docking results for 1i4d. The Cα
iRMSD values between the experimental structure and
the configurations predicted by DoBi and ZDOCK are 2.97 Å
and 1.97 Å, respectively. DoBi improves the F-score of the
ligand protein by 2.7%, and that of the receptor protein by
0.4%. The configurations produced by the two methods are
shown in Figure 5.

Factors affecting the performance of DoBi 

We notice that DoBi performed poorly on a few
specific instances. We analyze this performance issue
with Table 15, which compares the ACE scores of the
experimental structures and the predicted complexes for the
bound states of the proteins in Benchmark v4.0. Among
the 176 complexes, only 43 have an experimental structure
with an ACE score lower than that of the predicted
complex. This implies that in 133 cases DoBi is able
to find a configuration with a lower score than the
experimental structure. These anomalies suggest that the score
function currently used in DoBi may be inaccurate, and
that this inaccuracy may have contributed to the cases in
which DoBi performed poorly. We also note that the search
space currently explored by our method is incomplete, which
may also have contributed to the inaccuracy of DoBi in
some cases.
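The comparison behind these counts can be illustrated with the rows of Table 15 that are fully legible. The sketch below assumes that the first two numeric columns of the table are the ACE scores of the experimental structure and of the predicted complex, in the order of the table footnotes; the variable and function names are hypothetical.

```python
# (ACE score of experimental structure, ACE score of predicted complex),
# taken from the complete rows of Table 15. Column order is an assumption
# based on the order of the table footnotes.
ACE_SCORES = {
    "2mta(L:A)": (-55.7, 70.5),
    "2nz8(A:B)": (52.1, 36.3),
    "1jiw(P:I)": (110.3, -628.9),
    "2b42(A:B)": (103.6, -199.4),
    "1nw9(B:A)": (-120.1, -333.9),
    "1e96(A:B)": (110.7, -120.4),
    "2b4j(A:C)": (94.0, -120.6),
    "1eaw(A:B)": (12.0, -173.1),
    "2c0l(A:B)": (130.2, -225.3),
    "1ahw(B:C)": (262.7, 388.1),
    "2hqs(A:H)": (190.9, -202.6),
}

def split_by_lower_score(scores):
    """Split complexes by which structure has the lower ACE score."""
    native_lower = [c for c, (e_act, e_pre) in scores.items() if e_act < e_pre]
    predicted_lower = [c for c in scores if c not in native_lower]
    return native_lower, predicted_lower

native_lower, predicted_lower = split_by_lower_score(ACE_SCORES)
print(len(native_lower), "complexes where the experimental structure scores lower")
print(len(predicted_lower), "complexes where the predicted complex scores lower")
```

A complex in the second group is one where the score function ranks a non-native configuration above the experimental structure, which is the anomaly discussed above.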

Figure 6 shows the protein complex incorrectly
predicted by DoBi together with the experimental structure
for 1kxq(H:A). The iRMSD between the two complexes is
18.87 Å. The ACE score of the docking structure predicted
by DoBi, -497.6, is lower than the ACE score of the
experimental structure, 63.7. The binding sites predicted by
DoBi are incorrect as well.



Table 15 Comparison of Atomic Contact Energy for the Predicted Complexes and the Experimental Structures on Benchmark v4.0

Complex      Eact      Epre      Fr (%)    Fl (%)
2mta(L:A)    -55.7     70.5      77.4      80.0
2nz8(A:B)    52.1      36.3      66.6      77.1
1jiw(P:I)    110.3     -628.9    53.9      66.7
2b42(A:B)    103.6     -199.4    24.0      23.6
1nw9(B:A)    -120.1    -333.9    77.3      80.0
1e96(A:B)    110.7     -120.4    66.6      60.6
2b4j(A:C)    94.0      -120.6    53.8      66.7
1eaw(A:B)    12.0      -173.1    23.2      74.3
2c0l(A:B)    130.2     -225.3    76.7      78.1
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-50.4 
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66.6 


41.0 
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66.7 


2pcc(A:B) 


47.3 


98.9 
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30.3 
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-14.3 
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76.5 


52.2 
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-194.7 


64.8 


57.1 
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52.7 


63.2 
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73.6 
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38.7 


1ahw(B:C)    262.7     388.1     64.7      80.0


1ibr(A:B) 


234.0 
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30.2 
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75.9 
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85.3 
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64.5 


85.7 
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51.1 
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115.0 


-60.2 


20.4 


38.9 
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75.6 


83.3 


1ay7(A:B) 


123.2 


-30.3 


64.5 
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107.0 


-227.5 


46.5 


48.9 


2hqs(A:H)    190.9     -202.6    10.5      22.6



Eact is the ACE score for the experimental structure on the data set.

Epre is the ACE score for the predicted complex on the data set.

Fr (%) is the F-score of our method for the receptor protein on the data set.

Fl (%) is the F-score of our method for the ligand protein on the data set.
Conclusions

In this work, we proposed an approach to identify binding
sites in protein complexes by docking protein subunits.
The method is implemented in a program called DoBi.
In our experiments, DoBi consistently and significantly
outperformed existing techniques in predicting binding
sites.

We identify a few potential areas for future
improvement of our method. The first is the
energy function. Currently, DoBi uses a simple score
function. As the experimental results suggest, a better
energy function would likely improve the performance of
DoBi.

A second area for improvement is our current
assumption that protein structures are rigid when
binding. In reality, protein structures may vary slightly or even
dramatically when they bind. Hence, further studies on
this issue are needed.

Although our method shows better overall performance,
there are some protein complexes on which other
methods outperformed DoBi. It would be beneficial to
combine the strengths of these existing programs
with DoBi to obtain a more reliable method.

Endnote 

The name DoBi is formed from the initial two letters of each
of the two words, Docking and Binding.